A Look at the Mosaic of Educational Evaluation and Accountability¹



The terms "evaluation" and "accountability" are becoming so well
ingrained in educational parlance that it would be easy to assume that the
corresponding activities are well understood by educators and well entrenched
in educational practice. Phrases such as "program accountability" and
"using evaluation to support decision making" appear more and more
frequently in educators' writings and conversations in which they describe
school activities.

A closer look reveals that what is observed may be largely a form
of semantic orthodoxy. Demands to make educational systems accountable
to their publics are proliferating at a rapid pace. Yet, as Glass (1972) has
noted, most of the activities which masquerade as forms of accountability
fail to result in real accountability. More and more legislative bodies are
authorizing funds for the express purpose of evaluating educational programs
to determine their effectiveness. Yet many of the resulting systems fail
to hold the schools accountable at all and deteriorate into mandated
information management or testing systems which add little if anything to the
quality of education. Verbal statements about evaluation and accountability
are abundant, but genuine evaluation of educational programs is infrequent.

One reason for the scarcity of good examples of evaluation and
accountability in education probably lies in the fact that school practitioners
have had little usable guidance in how to facilitate or conduct such activities.
The evaluation literature is badly fragmented into unrelated pieces and is as
difficult to synthesize as it is to make a meaningful picture from a random
handful of pieces to a jigsaw puzzle. Looking at the individual pieces is
little more helpful, for the level of discourse in individual writings is often
aimed at fellow evaluation theorists more than at schoolmen, thereby
communicating a great deal of detail about a topic which lacks a larger
context within which it could be useful. Working under this handicap, busy
practitioners can hardly be faulted for not expending the necessary time to
try to develop a clear picture from the current evaluation literature.

¹ This paper is based on the author's script for two programs on
evaluation in a 1973 University of Colorado state-wide educational television
series entitled "Educational Accountability."

The purpose of this paper is to examine briefly a few major concepts
about evaluation and accountability and relate them to one another in a way
that will provide a simple portrayal of part of the mosaic of educational
evaluation and accountability. Two caveats should be stated at the outset.
First, this paper is not intended for evaluation specialists or schoolmen
well versed in evaluation theory and practice. It is primarily intended for
the practitioner who wants a brief summary of some of the more important
notions about evaluation which have been presented during the past several
years. Second, the basic thread which will run through this paper is
evaluation, with accountability playing only a supporting, illustrative role.

An Attempt at Definition 

Evaluation is closely related to several other terms with which it
is often associated and generally confused: terms like research, assessment,
measurement, and, of course, accountability. These terms should be
separated from one another, since the meaning we attach to words often
influences what we do (Glass and Worthen, 1971). It is not my intent here
to engage in the usual academic activity of defining one term by use of another
that is equally arcane. Those who find dictionary-style definitions helpful
should read earlier writings by Wardrop (1972) or Worthen and Sanders (1973).
I would prefer to use some very simple-minded examples to illustrate differences
in five interrelated but different concepts: measurement, assessment,
evaluation, accountability, and research.

A high school invitational pole vaulting meet, in which a number of boys
participate, can serve as an example. The performance of the boys could
be viewed in relation to each of the five concepts, as illustrated below.

Measurement answers the question, "How high did each boy vault
successfully?" It is the simple act of determining the maximum height at
which each boy cleared the bar.

Assessment answers the question, "How well did each boy meet the goal
or objective he (or his coach) set for him?" Assessment comprises three
activities: (1) decisions about goals or objectives; (2) measurement of how
well objectives are attained; and (3) a summary of the measurement
information in relation to the objectives or to relative performance. Pursuing
our example, a minimum objective is reflected in the decision to set the
bar at 10 feet for the initial round since it is expected all the boys will
clear that height. Individual objectives are reflected in decisions to try to
exceed a height of 18 feet or break the record of 16 feet. Decisions about
how to measure attainment of objectives are evidenced in established rules
that height of jumping should be measured in feet and inches and a miss
occurs when a boy knocks the bar off three consecutive times at a given
height. Measurement occurs when those rules are applied. The statement
that all of the boys cleared 10 feet but only two of them were able to clear
15 feet is a brief summary of the measurement information in relation to
the objectives or to relative performance.

Evaluation answers questions like "Given a standard such as height,
which boy is the best pole vaulter?" "Overall, did the use of bamboo poles
or steel poles result in greater heights in vaulting?" "Which type of pole
broke most often during a vault?" "Did the training program used by a
particular high school produce satisfactory results?" Evaluation includes
(1) determining what measures and standards should be used to judge
performance (e.g., height of highest successful vault, consistency of successes
without a miss, form), (2) deciding whether the standard should be relative
(e.g., compared to other boys) or absolute (e.g., a state-wide minimum
height for qualifying), (3) collecting the relevant information through
measurement or other means, and (4) applying the standard in determining
merit or effectiveness.

Accountability answers the question "Were the coaches and athletic
programs responsible for preparing the boys for the meet successful in
helping their boys reach expected performance levels and/or win the meet?"

Research answers questions like "What are the characteristics of
steel poles or cross-handed grips which make them superior to their
counterparts?" "Why does athletic program A produce better results
than program B?" In the pole vaulting example, the primary function of
research would be to determine why certain performance levels were
reached.
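
For readers who find a worked example helpful, the first three concepts can
be sketched in a few lines of Python. This is only an illustration under
stated assumptions: the boys' names, their heights, and the 13-foot
qualifying standard are invented for the example, and the function names
are hypothetical labels rather than established terms. The heights are
chosen so that all boys clear 10 feet and only two clear 15 feet, as in the
summary statement above.

```python
# Hypothetical data: best height cleared (in feet) by each vaulter.
best_vault_ft = {"Adams": 15.5, "Baker": 10.0, "Clark": 15.0, "Davis": 12.5}


def measure(results):
    """Measurement: report the maximum height each boy cleared."""
    return dict(results)


def assess(results, objective_ft):
    """Assessment: summarize the measurements in relation to an objective."""
    met = sorted(name for name, height in results.items()
                 if height >= objective_ft)
    return {"objective_ft": objective_ft, "met": met, "total": len(results)}


def evaluate_relative(results):
    """Evaluation with a relative standard: who is best compared to the others?"""
    return max(results, key=results.get)


def evaluate_absolute(results, standard_ft):
    """Evaluation with an absolute standard: who meets a fixed qualifying height?"""
    return {name: height >= standard_ft for name, height in results.items()}


if __name__ == "__main__":
    print(measure(best_vault_ft))
    print(assess(best_vault_ft, objective_ft=10.0))
    print(assess(best_vault_ft, objective_ft=15.0))
    print(evaluate_relative(best_vault_ft))
    # 13.0 feet is an assumed state-wide qualifying height.
    print(evaluate_absolute(best_vault_ft, standard_ft=13.0))
```

Note that only the last two functions apply a standard in order to reach a
judgment of merit; the first two stop at description, which is the
distinction the pole vaulting example is meant to draw.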



These examples oversimplify the five concepts but may help to illustrate
major differences among them and reveal why my focus in this paper will be
primarily on evaluation, secondarily on accountability, and not at all on
research, assessment, or measurement. Research is clearly an enormously
complex undertaking which goes far beyond simple evaluative findings (e.g.,
Program A is better than Program B on a particular criterion) to try to fix
the causes for those findings. The complex activities inherent in such pursuit
of causal explanations make genuine research a luxury few school districts
can afford. Assessment has many of the trappings of evaluation and shares
with it many common activities, but it lacks evaluation's explicit judgments
of worth or effectiveness. Assessment generally is used to depict something
in detail, looking at it through a frame established by the goals or objectives,
but it stops short of judging whether the resulting portrait is good or bad,
tasteful or tasteless. Useful as assessment is, going beyond it to a complete
evaluation of an educational program is critical to attempts to improve school
programs. Measurement is simply a process for collecting the data on which
evaluative judgments will be made. It is a key tool in evaluation but hardly
suffices in and of itself. Accountability is a broad concept which goes beyond
evaluation but obviously depends on evaluation as one of its central steps,
making a discussion of evaluation essential to any discussion of
accountability.²

If this attempt to sort evaluation out from near "look-alike" terms
leaves some of the distinctions blurred, they hopefully will become more
clear as the concepts of evaluation and accountability are discussed in the
remainder of this paper.

² Glass, in a statement quoted later in this paper, argues that
evaluation is not an essential ingredient in accountability. In the broad and
relatively pure type of accountability he describes in his writings (Glass, 1972),
the argument is valid. However, I fail to see much utility in a system which
merely served to disclose performance without also providing standards by
which the performance could be judged.

Several Views of Evaluation 

Until 1965, the term evaluation was generally used in education as a
synonym for grading. Little real evaluation of educational programs per se
had taken place.³ With the passage of the Elementary and Secondary Education
Act and its accompanying mandate that all Title I and III projects must be
evaluated, a more general concern for evaluation was registered, in some ways
a preview of the accountability movement to come. During this period, many
prominent methodologists and educationists turned their attention to how
educational programs should be evaluated. Many evaluation "models"
emerged, ranging from near prescriptions for how evaluations should be
carried out to presentations of a few factors which should be considered in
any evaluation. These models have appeared in the literature and, in the
absence of a good empirical base for determining the best way to evaluate
educational programs, have greatly influenced the present practice of
evaluation.⁴ These models have been reviewed elsewhere (Worthen and
Sanders, 1973) and it is not my intent to summarize them here. Instead,
I would like to quote statements made by some of these leading thinkers in
evaluation which summarize their views of what evaluation is and provide
a backdrop against which to discuss evaluation and accountability in further
detail.⁵

³ Obviously, there are notable exceptions to this statement such as
the Eight Year Study, but they are clearly the exception rather than the rule.

⁴ The impact and general shortcomings of the models have been
discussed previously (Worthen, 1972) and will not be repeated here.



Ralph W. Tyler 



Educational evaluation is finding out what students have
learned in their school work and which of them are having
difficulty in learning. For example, in the primary grades how many
children have learned to read, to add, subtract, multiply and
divide, and to work cooperatively with other children? Which ones
are having difficulty in learning these things? Have they learned
other things of value? Or, as another example, in the high school
how many youth have learned to write clearly, understand the basic
principles of our Constitution, and can explain the processes of
Nature? Have they learned other things of value in the high school?
Which are having difficulty? All of us, whether in education,
business, health services, or other fields need to know how we are
doing. Are we really attaining our purposes and to what extent?
Are we having difficulties? What are they? Are there improvements
that need to be made? Educational evaluation is important and
necessary both to help the teacher and to give the public a better
notion of our educational achievements and where our problems lie
that require thoughtful attention. (Tyler, 1973).



⁵ The statements quoted herein are taken from audiotaped statements
originally included in the television program referenced earlier. The charge to
the persons quoted was to prepare a brief (one to three minute) statement
completing the phrase "Evaluation is . . ." for a practitioner audience. Readers
should keep in mind the severity of the constraints imposed by that charge.

W. James Popham



When people evaluate, they make an appraisal of some kind.
They make an estimate or judgment of the worth of some phenomenon,
and in educational evaluation we are concerned with making appraisals
of the worth of educational enterprises. The major activities of
educational evaluators can be focused in three general arenas. Two of
these have to do with specification of the intentions we want to
accomplish in our educational endeavors. Many people refer to these as
"statements of instructional objectives."

The first kind of evaluation we have to engage in attends to
the objectives themselves. Which objectives are really worthwhile
pursuing? Which are worth accomplishing even if we could? To
evaluate educational objectives, we discover that the more precisely
they are articulated, the more rationally we can decide upon which
objectives should or should not be pursued in our schools. We are
beginning to devise ways whereby students, teachers, scholars,
citizens, everyone who has a stake in the educational enterprise, can
appraise the worth of educational objectives. The more precisely
those objectives are explicated, the better the evaluative judgments
can be.

A second focus of educational evaluators concerns the
assessment of the degree to which objectives have been achieved. Once we
decide upon the really worthwhile goals, a second task is to discern
whether the objectives have been realized as a consequence of our
educational endeavors. Once more we find that an explicitly stated
objective, stated in terms of measurable learner behaviors, permits
more readily such assessment. We can discover whether such
objectives have been achieved. And educational evaluators are very much
concerned with discovering the degree to which objectives have been
achieved.

A third focus of educational evaluation these days is upon
judging all of the effects of instructional endeavors, both those which
were intended (as reflected by the objectives) and those which were
not anticipated at all. In other words, rather than being attentive
only to what intentions the instructional designers had at the beginning
of instruction, we should attend to all the results of an instructional
endeavor, those that were anticipated as well as those which were
unforeseen. (Popham, 1973).



Robert E. Stake 

Let us look more carefully at the notion of evaluation. To me,
it is mostly a matter of saying something is good or bad, or saying
how good it is or how bad it is. In order to communicate effectively
with other people when evaluating, we have to talk about what it is
that we are evaluating, and that may take a great many words and a
few pictures. It may take many different displays to indicate to other

Daniel L. Stufflebeam



What is evaluation? What is it for? What questions does it
address? Who should do it? How should they do it? And by what
standards should their work be judged? Persons' responses to these
questions can reveal whether they have thought very much about
evaluation and if so, what their conceptualizations are. Briefly, here are
my responses to these six questions.

1. Evaluation is the ascertainment of merit.

2. Evaluation serves both decision making and accountability.

3. Evaluation addresses questions about goals, designs,
procedures, and results.

4. Evaluation for decision making can appropriately be
performed by an agency staff, but external personnel
should be involved in evaluation for accountability.

5. The process of doing evaluation involves delineating the
evaluation requirements, obtaining the relevant information,
and providing the obtained information to the appropriate
audiences.

6. Evaluative information should be judged for its technical
adequacy, its utility, and its worth compared to its cost.

As a final point, evaluation should serve not only to prove the
worth of programs, but also to improve them. (Stufflebeam, 1973).



⁶ Stake's statement was excerpted from an audiotape, The Teacher and
Accountability, produced by the VIMCET Corporation and reproduced here
with their permission.

Michael Scriven

Evaluation is the systematic, objective determination of the
merit or worth of something. In this [...] something is usually an
educational [...] perhaps [...] educational personnel. But having a
definition tells you little about how to do it. The way I conceptualize
doing it, a [...] be placed on the comparative element in evaluation.
I think evaluation is very rarely of any interest unless it tells you
something about how well the thing you are looking at did by
comparison [...] that are available or could be set up at comparable
cost. So, to me the main task in educational evaluation is identifying
the most important comparisons to be made (the critical competitors)
and then proceeding to document the comparisons on the various
dimensions of interest to the respective audiences of the evaluation.
Correctly done, this approach avoids a very serious flaw in a great
deal of educational evaluation: that of regarding everything as
necessarily to be evaluated in terms of the goals of its developer or
its designer. . . .

To sum it up, the evaluation approach which will [...] to
educational practice is (1) the constant juxtaposition of the item to
be evaluated with various critical competitors and (2) [...] analysis
of the dimensions of difference of performance [...] with respect to
the needs of the target population rather than with respect to the
goals of the producer. That's evaluation. (Scriven, 1973).



Gene V Glass

Evaluation is the assessment of the worth or value of a thing.
But rather than talking about what evaluation is, I want to say something
about what evaluation is not. Think of the word BOATS as a mnemonic
reminder of the things that look like evaluation, but really are not.

Budgeting. Budgets are useful. Every school district [...]
one and probably does. Somehow, perhaps because Program Planning and
Budgeting was mandated along with accountability in Colorado, drawing
up budgets has become confused with being accountable and, worse yet,
with evaluating school programs. Budgets can be based on nothing more
than whims, fads, and poor judgment. When you get right down [...]
superintendent drawing up a budget is not necessarily evaluating or
judging anything. He is only drawing [...]

Objectives. Objectives, goals, and intentions are basically the
same things. Stating objectives is sometimes the first step in evaluation,
sometimes not. Under some circumstances the evaluator need not concern
himself with the program objectives at all. After all, intentions are only
intentions. It is the value of what eventuates from a school program that
counts. . . .

Accountability. In my opinion, the principle of disclosure is at
the heart of true accountability. The accountable school discloses its
goals, decision making procedures, financial affairs, and its
accomplishments, good and bad. I would regard a school as acting accountably
if it merely disclosed such facts to the public, even if it could not accomplish
the more difficult job of turning the facts into value claims. It is not
necessary to evaluate to be accountable.

Testing. Millions of standardized tests are given every year in
schools and, on balance, I suspect they are worth the cost. Tests are
often used in program evaluation, but there are many steps between
testing children and validly judging the worth of the programs they take
part in. The methodology of evaluation is a guide for use in moving
from evidence, including test scores, to judgments of value. The
technology of testing has little to do with deriving value judgments, so
although your school may administer a great many tests, it does not
follow that it is doing a lot of evaluation.

As for the S in BOATS, I prefer to let it dangle on the end. I
would not want things to work out too neatly. They never do in reality,
neither in teaching nor evaluation. (Glass, 1973).



As the statements above show, their authors differ somewhat in their
views of what evaluation is and how it should be carried out, making it
impossible to combine their notions into any single prescription for how to
evaluate a particular school program. However, there are some common
issues to which each of these evaluators has attended in one way or another
and their divergent suggestions provide a set of alternatives from which
practitioners can select in conducting a program evaluation. In the
remainder of this paper, I will present a few simple concepts about
evaluation (and to a lesser extent accountability) and discuss the alternatives
which exist for schoolmen as they approach each concept.



Some Basic Evaluation Concepts

The first topic I would like to discuss is the relationship of evaluation
to decision making in the schools.

Evaluation for decision making. The basic notion here is very simple
and has been well articulated by Stufflebeam in his writings (e.g., Stufflebeam
et al., 1971). Put simply, the idea is that evaluation exists to provide
information to administrators so they can make more intelligent decisions about
the programs they administer. Administrators obviously must make decisions
about educational programs, whether or not they have adequate information
about which of the alternatives they are choosing from is best for their
purpose. The role of evaluation as it relates to decision making is to
examine each alternative critically and make a judgment about its worth for
the purpose the administrator has in mind. Such collection of evaluative
data to help make intelligent decisions is the major use for evaluation as I
see it relating to educational programs.

Evaluation of decision making. This is the process of looking explicitly
at how the administrator goes about reaching a decision. Did he consider
alternatives? Did he look at all the data? Was he unduly influenced by
political considerations? This is a specialized use of evaluation where
decision making is merely the object of the evaluation, an interesting and
important use of evaluation but not the major one of concern in this paper.

Evaluation of the impact or results of decision making. Accountability
legislation in several states has led to this use of evaluation. Given that a
particular decision has been made, what impact has that decision had on the
quality of educational programs? The authors of the 1971 Colorado
accountability act evidently had this in mind when they stated that it was
necessary to establish a ". . . means for determining whether decisions
affecting the educational process are advancing or impeding student
achievement" (Colorado Senate Bill No. 33, p. 2). It seems strange that they
would press to evaluate the impact of decisions without first seeing that the
decision maker has good evaluation data, and uses it. This suggests a certain
innocence on the part of the legislators about how to really effect educational
improvements. It is not very useful to worry only about whether the decisions
being made are helpful, hurtful, or of no consequence when equal concern
should be shown for how to improve those decisions. Legislatures and school
systems should not become so preoccupied with the outcomes of the decision
making process that they fail to solve problems those outcomes might reveal.
It seems advisable in this context to use evaluation as Stufflebeam has
proposed, as a mechanism for administrators' use in judging decision
alternatives to help them make better informed decisions. If evaluation is used
effectively in this way, looking at the results of decision making should reveal
fewer decisions that have affected educational programs negatively or not at
all.

Beyond its relationship to decision making, there are two additional
dimensions of evaluation which are interrelated and should be discussed
together. These are formative vs. summative evaluation and internal vs.
external evaluation.

⁷ The 1971 Colorado Accountability Act is discussed throughout this paper
as an example of public [...] accountability legislation. However, because of
basic similarities in the Colorado law and accountability legislation in other
states, the discussion herein can be generalized to many of the other states
where school personnel are faced with the task of implementing new
accountability laws.

Formative and Summative Evaluation

Scriven (1967) first distinguished between formative and summative
evaluation. Since then, the terms have become almost universal in their
use in the field. Although in practice distinctions between these two types of
evaluation may blur somewhat, since they are not strictly orthogonal, it seems
useful to summarize the major differences noted by Scriven, even at the risk
of some oversimplification.

Formative evaluation simply refers to evaluation that is conducted during
the operation of a program for the express purpose of providing evaluative
information to program directors for their use in improving the program.⁸ For
example, during the development of a curriculum package, formative evaluation
would involve content inspection by experts, pilot tests with small numbers of
children, field tests with larger numbers of children and teachers in several
schools, and so forth. Each of these steps would result in immediate feedback
to the developers who would use the information to make necessary revisions
in the materials.

Summative evaluation is evaluation conducted at the end of a program for
the express purpose of judging the worth or effectiveness of that program for
potential users for whom it has been developed. For example, after the
curriculum package is completely developed, a summative evaluation might be
conducted to determine how effective the package is with a national sample of
typical schools, teachers, and students at the level for which it was developed.
Note that the audience here is very different. In formative evaluation, the
audience for the evaluation report comprises personnel in the program (in our
example, those who were responsible for developing the curriculum package).
In summative evaluation, the audiences for the evaluation report include the
potential users (students, teachers, and other professionals) and the source
of funding (taxpayer or funding agency), as well as program personnel.

⁸ The discussion [...] is intended to apply
equally to evaluation of educational programs, projects, products, and
processes; indeed, any object of an educational evaluation. However, to
avoid tedious redundancy, only one term (e.g., "program") will generally be
[...] objects of educational evaluation can be assumed to be included by
implication.



Program development decisions and accountability [...]
respectively on formative and summative evaluation. Formative evaluation
leads to (or should lead to) decisions about program development (including
modification, termination, continuation, and the like). Summative evaluation
is one of the necessary steps in making accountability decisions. The 1971
Colorado accountability legislation (and many other state accountability laws
as well) deals primarily with summative evaluation and emphasizes formative
evaluation little if at all. This is unfortunate, not because summative
evaluation is unimportant (no right-thinking educator could take that stand)
but because without formative evaluation it is incomplete and inefficient.
Consider the foolishness of developing a new design for an aircraft and
submitting it to a "summative" test flight without first testing it in the wind
tunnel. The probable success of premature summative evaluations in education
seems little greater.

Internal and External Evaluation 

The dichotomy of internal vs. external evaluation is largely
self-explanatory. The adjectives refer to whether the evaluator is internal to
(an employee of) or external to the program being evaluated. A Title III
program might be evaluated by an evaluator who is a member of the project
staff (internal) or by a site visit team appointed by the State Department of
Education (external). There are obvious advantages and disadvantages with
both of these roles. The internal staff evaluator is almost certain to know
more about the project than is possible for any outsider, but he may also
be so close to the project that he is unable to be completely objective in his
view of it. There is seldom as much reason to question the objectivity of
the external evaluator (unless he is found to have a particular ax to grind)
and this dispassionate perspective is perhaps his greatest asset. Conversely,
it is difficult for an external evaluator to ever learn as much about the project
as the insider knows. Note that when I say as much, I refer only to quantity,
not quality. One often finds an internal evaluator who is full of unimportant
details about the project but overlooks several critical variables. If these
bits of key information are picked up by the external evaluator, as is
sometimes the case, he may end up knowing much less overall about the project
but knowing much more of importance.

Possible Role Combinations

The dimensions of formative and summative evaluation can be combined
with the dimensions of internal and external evaluation to form the two-by-two
matrix shown in Figure 1.

                    Internal      External

    Formative          1             2

    Summative          3             4

Figure 1. Combination of Evaluation Roles

The most common roles in evaluation might be indicated by cells one
and four in the matrix. Formative evaluation is typically conducted by an
internal evaluator. His knowledge of the program is of great value here and
possible lack of objectivity is not nearly the problem it would be in a summative
evaluation. Summative evaluations are typically (and probably best) conducted by
external evaluators. It is difficult, for example, to know how much
credibility to accord a SEA evaluation which concludes that a set of reading
materials is far better than its competitors.

Another important role, that of the external formative evaluator shown
in cell two, is almost completely neglected in educational evaluations. As
implied earlier, the internal evaluator may share many of the perspectives
and blindspots of the rest of the program staff and, consequently, neglect
even to entertain some negative questions about the program. The external
evaluator doesn't have long familiarity with the program to fall back on and
he is much less likely to be influenced by a priori perceptions that it is
basically good. This is not synonymous with saying that he is predisposed
toward judging the program as bad. His orientation should be neither positive
nor negative, only neutral and uninfluenced by close associations either
with the program or its competitors. In essence, the external formative
evaluator introduces a cold, hard look of reality into the evaluation relatively
early, in a sense a preview of what a summative evaluator might say. This
fresh outside perspective is important, even if used infrequently, to avoid the
frequent disaster that occurs when a program staff carefully and
self-consciously conducts a formative evaluation of their own program, using
criteria and variables they interpret as proving the program successful,
only to have an outside agency (a school board or site visit team) close the
program down because their summative evaluation focused on other variables
or used different criteria which resulted in overall negative outcomes. Wisdom
would dictate the use of an outside formative evaluator as part of the formative
evaluation of every program or product.



Cell three in Figure 1, the internal summative evaluator, strikes me
as a role that is only infrequently appropriate. As stated earlier, the
summative evaluation is generally best conducted by an external evaluator or agency.
However, in some instances there is simply no possibility of obtaining such
external help because of financial constraints or absence of competent
personnel willing to do the job. In these cases, the summative evaluation is weakened
by the lack of outside perspective, but it might be possible to retain adequate
objectivity and credibility by choosing the internal summative evaluator from
among those who are some distance removed from the actual development
of the program or product being evaluated.

The concepts discussed so far lead to consideration of two radically
different approaches to program evaluation: goal-directed evaluation and
goal-free evaluation. Each of those approaches is described briefly below.

Goal-Directed Evaluation

Goal-Directed Evaluation is perhaps the most common type of
evaluation practiced in education. It is basically the approach first suggested
by Ralph Tyler as early as the 1930s and reiterated in his statement quoted
earlier. This approach has been adopted and expanded by the numerous
advocates of the use of behavioral objectives. In essence, it depends on
six basic steps:

1. Establishing broad goals for the program.

2. For each broad goal, identifying specific objectives which, if
attained, would result in attainment of the goal.

3. Stating each objective in measurable terms.

4. Developing or selecting measures of performance (usually pupil
performance) for each objective.

5. Conducting the program which is to attain the objectives.

6. At the end of the program, measuring performance on each
objective to see if expected outcomes have been achieved.

The third step provides the genesis for behavioral objectives. As an
aside to the discussion of goal-directed evaluation, I am uncomfortable with the
current fanaticism about behavioral objectives which seems to have permeated
the field of education. It is obviously time that educational goals and
objectives are made more explicit and observable and, as Popham stressed in his
statement quoted earlier, objectives obviously should be stated in terms in which
they can be assessed. But one cannot help being distressed by the mindlessness
running rampant through education which would have all educators state
every intent, however trivial, in behavioral terms. In some schools, the
staffs are spending so much time and energy stating everything they want to
teach in behavioral terms that they hardly have time to teach. I am frankly
unsympathetic with the zealous efforts to train every teacher to use a recipe
to translate every aspiration into a behavioral objective. This is especially
true where teachers are used to write objectives intended more for evaluation
than for instructional purposes. It is, after all, the evaluator who is
supposedly skilled in the language of operationalization. I think the evaluator
should take the following stance in working with program personnel: "Give
me an objective in any form, just so I understand what your intent is. As
an evaluator, I will translate your objective into behavioral terms and have
you review my statement to make certain I have not distorted your intent."
That makes more sense to me than trying to train all educators to be
evaluators. The pendulum obviously needed to move from the irresponsibly
soft-headed position that educators do not need objectives because, after all,
they "know in their hearts they are right." But education swung too far to the
other extreme when it spawned the religion of behaviorism and the zealots who
apply it unintelligently. One can hardly oppose operationalizing instructional
objectives and assessing their attainment, but the level of reduction and the
utility of scores or hundreds of objectives for each area of endeavor should
be questioned. It is simply far too much of a good thing. If not tempered
with reason, the press for behavioral reductionism seems likely to backfire
by disenchanting educators with all objectives, a result which would cripple
educational evaluations.

Goal-Free Evaluation 


Goal-Free Evaluation has been recently introduced to the field of evaluation
by Scriven (1972). The rationale for goal-free evaluation can be summarized
briefly as follows. First, educational goals should not be taken as given;
they, like anything else, should be evaluated. Further, goals are generally
little more than rhetoric which seldom reveals the real objectives of the
project. In addition, many important outcomes of a program do not fall
in the category of goals or objectives anyway (e.g., a Title III project will
create additional jobs, a desirable outcome, but never an explicit goal of a
Title III project). The most important reason for proposing goal-free evaluation,
however, is the salutary effect it will have on reducing bias and increasing
objectivity in evaluation. In goal-directed evaluation, an evaluator who is told the
goals of the project is immediately limited in his perceptions; the goals provide
a set of blinders which causes him to miss important outcomes of the program
which are not directly related to the goals (side effects, as they are known in
medical parlance). For example, an evaluator who is told that the goals of a dropout
rehabilitation program are to (1) bring dropouts back into school, (2) train
them in productive vocations, and (3) place them in stable jobs may spend
all of his time designing and applying measures to look at things such as how
many dropouts have been recruited back into school, how many have been
placed and continued in paying jobs, and so forth. All to the good, and the
program may be successful on all these counts. But what about the fact that
the crime rate of other (non-dropout) children in the high school has trebled
since the dropouts were brought back into the school? Indeed, a hidden
curriculum in stripping cars seems to have sprung up! This is a negative
side effect which is much more likely to be picked up by the goal-free
evaluator than by the goal-directed evaluator who has his built-in blinders imposed
by his knowledge of the objectives.

Such a brief summary hardly does justice to Scriven's concept of
goal-free evaluation, but it at least provides an introduction to an interesting new
approach that is getting a lot of attention in the field of evaluation.

It might be helpful to point out that goal-directed and goal-free evaluation
are not mutually exclusive activities. Indeed, they supplement one another very
well. The internal staff evaluator of necessity conducts a goal-directed evaluation.
He can hardly hope to avoid knowing the goals of the program and it would be unwise
to ignore them even if he could. Program directors obviously need to know how well
the program is meeting its goals, and the internal evaluator uses goal-directed
evaluation to provide him with that information. At the same time, it is important
to know how others judge the program, not on the basis of how well it does what it
is supposed to do but on the basis of what it does in all areas, on all its outcomes,
intended or not. This is the task for the external goal-free evaluator who knows
nothing of the program goals. So it isn't either-or. Both goal-directed evaluation
and goal-free evaluation can work well together. Even Scriven agrees that the major
share of the evaluation resources should go to goal-directed formative and summative
evaluations. What is tragic is when all the resources go to goal-directed evaluation
on a program where the goals do not even begin to include the important outcomes
of the program.

Comparative vs. Non-Comparative Evaluation

Another consideration in evaluation which has relevance to the
preceding discussion is that of comparative evaluation vs. non-comparative
(or single program) evaluation. There is a long literature on this topic
which could not even be listed, let alone summarized here, but the issue
can be encapsulated briefly as follows. Comparative evaluations are those
where two or more programs or methods are compared with one another on
common criteria. For example, assume a public school system is planning
to establish an elementary language program in Spanish. All but two sets of
curriculum materials have been excluded on the basis of considerations such
as cost, guaranteed availability of replacement materials, and the like. In
addition to the two sets of printed student and teacher materials, some of the
teachers have expressed enthusiasm for a new conversational approach to
teaching Spanish which uses no written materials for students but depends
exclusively on in-class conversations. A comparative evaluation might be
designed to involve a random sample of six elementary schools in the district,
with each of the three approaches randomly assigned to two of those schools
for use there as the exclusive treatment. The outcomes of the three
curriculum approaches could be compared on criteria such as students' conversational
ability in Spanish and students' ability to read Spanish, and a judgment made
as to which approach is best on these criteria. Obviously, this example
is oversimplified since it ignores treatment-aptitude interactions, weighting
of criteria, and the like, but it should serve to illustrate the point.
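
The sampling and assignment in the Spanish example can be sketched in a few lines of code. This is only an illustration under stated assumptions: the district of twenty schools, the school names, the approach labels, and the function name assign_treatments are all hypothetical and do not come from any actual study.

```python
import random


def assign_treatments(schools, treatments, n_sample, seed=None):
    """Draw a random sample of schools and randomly assign each
    treatment to an equal number of the sampled schools."""
    if n_sample % len(treatments) != 0:
        raise ValueError("sample size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    # Step 1: random sample of schools from the district.
    sampled = rng.sample(schools, n_sample)
    # Step 2: build one slot per school (two per treatment here) and shuffle.
    per_treatment = n_sample // len(treatments)
    slots = [t for t in treatments for _ in range(per_treatment)]
    rng.shuffle(slots)
    # Step 3: pair each sampled school with one shuffled treatment slot.
    return dict(zip(sampled, slots))


# Hypothetical district and labels, used only for illustration.
district = [f"School {i:02d}" for i in range(1, 21)]
approaches = ["Printed materials A", "Printed materials B", "Conversational"]

assignment = assign_treatments(district, approaches, n_sample=6, seed=1974)
for school, approach in sorted(assignment.items()):
    print(school, "->", approach)
```

Fixing the seed makes the assignment reproducible for anyone auditing the design. The sketch covers only the assignment step; like the prose example, it ignores treatment-aptitude interactions and the weighting of criteria.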

Non-comparative or single-program evaluations obviously lack any
comparison group. The focus of these evaluations is internal and generally is built on
a goal-directed approach. Single-program evaluations are the most common
type of evaluation conducted in education today. In the previous example,
the school system would make a decision on some relevant basis (e.g.,
reputation of publisher or cost) to try a particular approach to teaching
Spanish. Objectives for the program would be carefully noted, the program
would be implemented, and, after it had run its course, measures would be
applied to see if it had attained its objectives. In short, the basis for judging
success would lie not in comparing the program with any other, but in an
internal check for discrepancies between what the program purports to do and
what it really does.

Both the comparative and single-program evaluation paradigms are well
entrenched in education and there is no unequivocal answer as to which is
best for all evaluations. It obviously depends on the questions to be answered
and the resources available for the evaluation, to mention only some of the
determinants. For example, if one is evaluating three house plans offered by
a particular builder, the problem can be approached as a single program
evaluation. Assume one criterion is "convenience," and an examination of
Plan A revealed that the single inconvenient feature was the necessity of
crossing an open hallway to carry food from the kitchen to the dining room.
So far, so good. Plan A comes very close overall to meeting its goal of
convenience. But should one choose it? That all depends on the other two
plans in the same price range. Does it have bearing on the decision if we
find that Plan B has no inconvenient features and Plan C requires that one go
through the bathroom to reach the kitchen? Of course, as I can testify from
living briefly in a rental unit with the Plan C feature. The point of this
example is that without looking at Plans B and C, one would never know what
he had selected (or rejected) with Plan A. For this reason, I tend to view
comparative evaluation as the ultimate in evaluating educational programs
since it allows you not only to know what you gain by choosing a particular
program or method, but also what you give up by rejecting other alternatives.
Numerous administrators think they want to know whether a particular program
does any good. What they really should be asking is what benefit the program
produces, at what cost, and compared with the benefits produced by other
alternatives with similar costs. Obviously, there are numerous occasions
when comparative evaluations are not appropriate to the
questions posed. Unfortunately, many appropriate opportunities to conduct
comparative evaluations are lost because many educators view "comparative
experiments" as useless or even harmful. This perception probably stems from
instances where they have seen comparisons conducted unintelligently by those
who evidenced in their designs sophomoric misunderstandings of the methodology
and its appropriate application. Hopefully this perception can be changed as
evaluators learn when and how to structure alternatives in ways that can be
demonstrated to have utility for decision makers.

An Analysis of Evaluation Characteristics of One Accountability Law

Enough abstract concepts have been presented that it may be helpful
to examine them in a real context. Describing and analyzing the evaluation
characteristics of one accountability law, the 1971 Colorado Accountability
Act, may be instructive, especially so since accountability legislation
in many other states contains similar characteristics.

I suggested earlier that the Colorado legislation includes a very narrow
view of how evaluation information might be used in decision making and noted
that looking only at the impact of the decision was akin to locking the barn
after the horse had escaped. It is unfortunate the focus of the law was not
at least as great on how evaluation could be used to improve the quality of
decisions and, in turn, their utility.

In writing accountability legislation, many lawmakers seem unclear as
to whether their real interest is simply in disclosure,
formative evaluation, or some combination. The activities mandated by
the Colorado Accountability Act are primarily summative. On the surface,
this seems eminently reasonable, since accounting for the benefits derived
from large expenditures of public funds is essentially a summative activity.
However, the language of the act and the discussions surrounding its passage
make it clear that the legislators were also interested in forcing educators to
collect information for immediate use in improving the quality of education.
Yet, there is no real provision for the regular use of evaluative data to
improve a school's program except at the end of each annual cycle. That
tardy schedule is hardly congruent with the intent of providing feedback for
program improvement. No one would deny the need for summative evaluation
information about Colorado schools, but it is unfortunate that the legislation
fails to place any explicit emphasis on formative uses of evaluation for program
improvement since this was one apparent interest of the legislators.

A more serious problem is that the summative evaluations which will
result from the application of the Colorado formula will not be very
satisfactory either. The law emphasizes internal evaluation within each district
and ignores external evaluation of programs in those districts. This would be
more understandable if the act focused on formative evaluation. Earlier in
this paper, internal summative evaluation (cell three in Figure 1) was presented
as the weakest case and the most difficult to implement without bias. Yet this
is the cell in which the Colorado law seems to fit best. Probably few lawmakers
would condone the practice of asking banks to conduct their own audits, or
asking pharmaceutical companies whether one of their drugs should go on the
market. Yet the Colorado assembly has asked school districts to conduct
summative evaluations of how well they are living up to the promises they
made and to report their conclusions back to the legislature, with the clear
implication that the information will influence future allocation of funds. Perhaps
the lack of a profit motive in public education is a telling factor, or perhaps
educators are simply more trustworthy than bankers and pharmaceutical
researchers. Conversely, one could speculate that educators have no special
moral eminence, especially when faced with accountability mandates which many
perceive as unreasonable and punitive. In such a context, it may be grossly
unfair to educators to ask them to carry sole responsibility for evaluating their
attainments. Inclusion of external summative evaluators would largely eliminate
the conflict of interest in which many Colorado schoolmen feel they have been
placed by the accountability act.



The present Colorado law is also a classic example of goal-directed
evaluation. The General Assembly mandated that each school district should report
once a year to its local constituency and to the state ". . . the extent to which the
district has achieved its stated goals and objectives" (Senate Bill No. 33,
p. 4). Apparently all a district has to do is state some general goals and
specific objectives, carry on their program for a year, and at the end of
that time report how well it has done on those goals and objectives. Although
it often is important to know whether or not a district attained its stated
objectives, such is not always the case. It depends largely on the prior
question of whether the goals were worth attaining in the first place. Some
goals that are attainable are hardly worth the effort. A more serious problem
occurs when all goals were attained not because the program was effective
but because the goals were set too low or had already been attained in part
through other means. What in the Colorado law is to keep insecure districts
from setting goals safely low, or overly ambitious districts from setting
goals impossibly high? The first could be applauded and the second censured
by lawmakers without the quality or effectiveness of their educational programs
really entering into the judgments. This is not to argue against local goal
setting, but only to point out that statewide accountability systems might better
depend on assessing and disclosing outcomes on "minimum essentials" which
should operate in all schools than asking each district to develop local goals
and measure their attainment in a way which defers meaningful interpretation
of the results. The situation is almost analogous to that in which one needs
to identify which children in a classroom are in good health and which are
suffering from malnutrition, and height is considered a relevant indicator.
There would be at least a measure of foolishness in asking each child to
make his own tape measure, use it in measuring his height, and then report
how well he has attained the height he desires to reach at that point or
whether he is too tall or too short for his age.

If it is not already patently clear, I am not enamored with the
Colorado accountability act as a model of accountability legislation. Even
if a district followed it in every detail and specification, the resulting system
would fail to qualify as either a good evaluation or accountability system.
Perhaps one cannot really expect a legislative assembly to write adequate
technical legislation in education and should not be discouraged by such
failures. It would seem more productive to focus on the obvious intent of
the act. The Colorado law is clearly intended to force school districts
to think about and articulate what they want to do and to assess the
effectiveness of what they attempt. The General Assembly obviously wants
schoolmen to look at where their decisions lead them and try to improve schooling
as a result. Rather than criticizing legislators because they exhibit some
naivete about evaluation (an innocence shared by many persons in education),
educators could better fulfill their role as responsible professionals by
attempting to implement the intent of the legislation. To do so would serve
the best interests of each school district, especially if educators saw
the advantages of "piggybacking" the development of a sound evaluation
system onto the need to meet legal requirements. Considerable time will
be demanded on the part of schoolmen to meet the minimum "accountability"
requirements, and the result could still be an inadequate evaluation system.
With some refocusing and a modest increase in time spent, a fully functioning
evaluation system could be developed. Schools could profit greatly if the
impetus provided by the legislation could be used as an opportunity to develop
a good evaluation system, even though it doubtlessly means exceeding the
minimum essentials described by law.

Characteristics of Good Evaluations

What I have presented so far implies that there are good evaluation
systems and bad evaluation systems and touchstones to enable educators
to tell one from the other. There are some basic components which in my
opinion should be included in any evaluation. Some of these have been
suggested explicitly or implicitly in writings of Scriven (1967), Stake (1967,
1970) and Stufflebeam (1968), while some of the proposals originate with my
views on evaluation. The result is a checklist of general characteristics of
good evaluations which any school could use to determine whether its evaluation
plan includes such important considerations.

1. Conceptual Clarity

Conceptual clarity, an essential feature of any good evaluation plan,
refers to whether or not the evaluator exhibits a clear understanding of the
particular evaluation he is proposing. Is he planning a formative or summative
evaluation? Is it a comparative evaluation design or a single program evaluation?
Is the evaluation to be goal-directed, with the design built around the
measurement of attainment of specific objectives, or goal-free with the design built
around lists of evaluative questions generated independently of the goals?
Answers to questions such as these should be apparent in any good evaluation
plan. Without clarity on these points, it would be an accident if the remainder
of the evaluation were anything but a muddle.

2. Characterization of Object of the Evaluation

No evaluation is complete unless it includes a thorough, detailed
description of the program or phenomenon being evaluated. Without such
characterization, judgments may be drawn about entities which never really
existed.⁹ For example, the concept of team teaching has fared poorly in
several evaluations, resulting in a general impression that team teaching
is ineffective. Closer inspection shows that many methods labeled as team
teaching provided no real opportunities for staffs to plan together or work
together in direct instruction. Obviously, a better description of the
phenomenon would have avoided these misinterpretations completely. One
simply cannot evaluate adequately that which he cannot describe accurately.

3. Recognition and Representation of Legitimate Audiences

An evaluation is adequate only if it includes input from and reporting
to all legitimate audiences for the evaluation. An evaluation of a school
program which answers only the questions of the school staff and ignores
questions of parents, children, and community groups is simply a bad
evaluation. Each legitimate audience must be identified and the evaluation
plan should include their objectives or evaluative questions in determining
what data must be collected. Obviously, some audiences will be more
important than others and some weighting of their input might be necessary.
Correspondingly, the evaluation plan should provide for receipt of appropriate
evaluative information by each audience which has a direct interest in the
program.

⁹ Charters and Jones (1973) have claimed that such appraisal of
"non-events" is frequent in program evaluation. However, their failure to present
empirical evidence for their claims led Murray (1974) to waggishly suggest
that critiquing the Charters-Jones paper might be evaluating a non-event.

4. Sensitivity to Political Problems in Evaluation

Many a good evaluation, unimpeachable in all technical details, has
failed because of its political naivete. It is pointless to promise to collect
sensitive data (e.g., principals' ratings of teachers) without first obtaining
permission from the office or individual who controls those data. Agreements
must be reached early in any evaluation about issues such as access to data
and data sources and safeguards against misuse of evaluation data. Steps
must be taken to guarantee that program staff have opportunities to correct
factual errors in evaluation reports without compromising the evaluation itself.
These issues exist in almost every evaluation and the more explicitly they are
dealt with, the more likely the evaluation is to survive political pressures.

5. Specification of Information Needs and Sources

Good evaluators tend to develop and follow a blueprint which tells them
precisely what information they need to collect and what the sources of that
information are. At the very least, they know how (as Scriven puts it) to lay
snares at critical points in the game trails. Conversely, the novice evaluator
goes about randomly turning over stones or beating the brush to see what he
can find. No evaluation can depend on a random, scattered "here a little,
there a little" approach to collecting data. An adequate evaluation plan
specifies at the outset the information which must be collected. If the
evaluation is goal-directed, the plan will specify information that will help to
determine whether the objectives were attained. If the evaluation is built
around evaluative questions (of the "What would you need to know to decide
whether the program was a success or a failure?" variety), the evaluation
plan should specify information which, when collected, will answer those
questions. And in every case, listing of the needed information leads
logically to identification of the sources from which that information can be
obtained. Failure to attend to these seemingly pedestrian but truly critical
steps is one of the greatest single reasons so many evaluations produce little
useful information.

6. Comprehensiveness/Inclusiveness

This category is really an elaboration of the previous one. No
evaluation can hope to collect all of the relevant data, nor would it be desirable
to do so, since there will always be inconsequential and trivial data not worth
the bother to collect. Collecting too much data is seldom the concern, however.
The greater problem is collecting enough data, or more precisely, collecting
data on enough important variables to be certain one has included in the
evaluation all the major considerations which are relevant. A good evaluation
includes all of the main effects, but also includes provisions for remaining
alert to unanticipated side effects. A good comparative evaluation doesn't
stop with comparing the experimental arithmetic program with a control group
which receives no arithmetic instruction. It goes on to identify the critical
competitors (SMSG math, Cuisenaire Rods, and so forth) and compares the
new program with those for which costs are roughly comparable. In short,
the weak evaluation is almost always characterized by a narrow range of
variables and omission of several which are important. The wider the range
and the more important the variables included in the evaluation, the better
it generally is.

7. Technical Adequacy

More evaluations fail here than on almost any other dimension, and this
is due to the scarcity of educational evaluators who are even marginally
competent in technical areas. Good evaluations are dependent on construction or
selection of adequate instruments, the development of adequate sampling plans,
and the correct choice and application of techniques for data reduction and
analysis. Volumes have been written on educational measurement, sampling,
and statistics and it would be pointless to try to review that knowledge here.
Suffice it to say that these areas are essential to most evaluations. Without
knowledge and control of these tools of his trade, the evaluator has little hope
of producing evaluation information which meets scientific criteria of validity,
reliability, and objectivity.

8. Consideration of Costs

Educators are not econometricians and should not be expected to be
skilled in identifying all the financial, human, or time costs associated with
programs they operate. That bit of leniency cannot be extended to the
evaluator, however, for it is his job to bring these factors to the attention
of developers, teachers and administrators who are responsible for their
products or programs. Educators are often faulted for choosing the more
expensive program of two that are equally effective, just because the
expensive one is packaged more attractively or has been more widely
advertised. The real fault lies with the evaluations of those programs which
fail to consider cost factors along with the other variables. As any insightful
administrator knows, costs are not irrelevant, and it is important for him to
know how much Program X will accomplish and at what cost, so he may know
what he is gaining or giving up in looking at other alternatives which vary in
both cost and effectiveness.

9. Explicit Standards/Criteria

It is always a bit disconcerting to read through an evaluation report and 
be unable to find anywhere a statement of the criteria or standards which were 
used to determine whether the program was a success or a failure. The 
measurements and observations taken in an evaluation cannot be translated 
into judgments of worth without the application of standards or criteria. Is 
an in-service program for teachers successful if 75 percent of the teachers 
attend? That all depends on the rationale for the program and the attendance 
standard that would signal success or failure. What about a 70 percent 
attendance rate in a high school mathematics class: is that good or bad? 
Again, it depends on the standard. If it is a college preparatory class with 
high attendance expectations (say, a standard of 95 percent), 70 percent is very 
poor. If it is a remedial mathematics class for dropouts who are returning 
to school on a part-time basis, the expectation might be considerably lower 
(say, 50 percent), and the attendance rate of 70 percent might be quite acceptable. 
These examples oversimplify the concepts, but hopefully they will not detract 
from the point that every good evaluation will include a statement of standards 
and criteria. 
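
The attendance illustration can be restated as a short sketch. It is only an illustration: the function name and the simple pass/fail comparison are assumptions made here, not a procedure from any evaluation report, and a real standard would seldom reduce to a single cutoff. The figures (70, 95, and 50 percent) are those used in the examples above.

```python
def judge_against_standard(observed_rate, standard):
    """Compare an observed percentage with an explicit standard.

    Hypothetical helper: a bare threshold comparison stands in for the
    judgment an evaluator would make.
    """
    for value in (observed_rate, standard):
        if not 0 <= value <= 100:
            raise ValueError("rates are percentages between 0 and 100")
    if observed_rate >= standard:
        return "meets standard"
    return "falls short of standard"


# The same observation, judged against two different explicit standards.
observed = 70
standards = {
    "college preparatory class": 95,
    "remedial class for returning dropouts": 50,
}

for setting, standard in standards.items():
    judgment = judge_against_standard(observed, standard)
    print(f"{setting}: {observed} percent against a standard of "
          f"{standard} percent -> {judgment}")
```

The observation is identical in both cases; only the stated standard changes the judgment, which is why the standard must appear in the report.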

10. Judgments and/or Recommendations 

The only reason for insisting on explicit standards or criteria is that 
they are the stuff of which judgments and recommendations are made, and 
the latter are the sine qua non of evaluation. An evaluator's responsibility does 
not end with the collection, analysis, and reporting of data. The data do not 
speak for themselves. The evaluator who knows those data well is in the 
best position to apply the standards to the data to reach a judgment of whether 
the program is effective or ineffective, valuable or worthless. Making judgments 
and recommendations is an essential part of the evaluator's job. An 
evaluation without these ingredients is as much an indictment of its author's 
sophistication as one with recommendations that are not based on the data. 



11. Reports Tailored to Audiences 

It was maintained earlier that there are multiple audiences for most 
evaluations and these audiences have different information needs. For example, 
when an evaluator completes an evaluation, his methodologically oriented 
colleagues will be interested in a complete, detailed report of the data collection 
procedures, analysis techniques, and the like. Not so for the school boards or 
the PTA or the chairman of the local taxpayer group. These audiences do not 
share the evaluator's grasp of technical details, his interest in test reliability 
and validity, or his concern over the appropriate choice of an error term in a 
randomized blocks design. The evaluator will have to tailor reports for these 
groups so that they depend on non-technical language and avoid over-use of 
tabular presentation of data analyses. A typical evaluation might end up with 
one omnibus technical evaluation report which self-consciously includes all the 
details and one or more non-technical evaluation reports aimed at the important 
audience(s). 

Another notion should be inserted here as well, that of interim or even 
continual reporting of evaluation findings. Timeliness is an important concern 
in evaluation. Information that is presented too late to affect the decision for 
which it is relevant is useless. Good evaluations will not depend solely on the 
printed word, but will include a variety of report formats, including "hot-line" 
telephone reporting, so the information is reported whenever it is needed to 
make a particular decision. 

Conclusion 

This paper was written for educators with little training or experience 
in formal evaluation of educational programs, products, or processes. In it, 
I have attempted to provide a brief overview of a number of important 
considerations in educational evaluation and accountability. Specifically, the 
following topics have been presented: (1) simple illustrations of differences 
in evaluation, research, assessment, measurement, and accountability; 
(2) a discussion of some basic evaluation constructs; (3) an analysis of the 
evaluation features of one accountability law; and (4) general touchstones for 
judging the adequacy of an evaluation. Obviously, this sampling neglects 
many areas and results in an incomplete, oversimplified portrayal of the 
field. Hopefully, it will prove useful either as an evaluation primer for 
beginners or as a guide to sources of information for their further study. 
If any of the contents prove informative for more experienced evaluators, 
that would have to be viewed, in the current idiom, as an "unanticipated 
side effect." 